
    Improving Japanese Zero Pronoun Resolution by Global Word Sense Disambiguation

    This paper proposes unsupervised word sense disambiguation based on automatically constructed case frames and its incorporation into our zero pronoun resolution system. The word sense disambiguation is applied to verbs and nouns: we regard case frames as defining verb senses and semantic features in a thesaurus as defining noun senses, and perform sense disambiguation by selecting among them through case analysis. In addition, following the one sense per discourse heuristic, word sense disambiguation results are cached and applied globally to subsequent occurrences of the same word. We integrated this global word sense disambiguation into our zero pronoun resolution system and conducted zero pronoun resolution experiments on corpora from two different domains. Both sets of experimental results indicate the effectiveness of our approach.
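    As a rough illustration of the one sense per discourse caching described above, the Python sketch below caches each word's first disambiguated sense and reuses it for later occurrences in the same discourse. The disambiguate() function and its candidate senses are hypothetical stand-ins, not the paper's case-analysis-based disambiguation.

    def disambiguate(word, context):
        # Placeholder local WSD: pick the first candidate sense (illustrative only;
        # the real method selects a case frame / thesaurus feature via case analysis).
        candidate_senses = {
            "bank": ["financial_institution", "river_bank"],
            "case": ["grammatical_case", "container"],
        }
        return candidate_senses.get(word, ["unknown"])[0]

    def disambiguate_discourse(words, context=None):
        # Disambiguate each word, reusing a cached sense once one has been chosen
        # for this discourse (one sense per discourse heuristic).
        sense_cache = {}          # word -> sense fixed for this discourse
        resolved = []
        for word in words:
            if word in sense_cache:
                sense = sense_cache[word]        # reuse the globally cached sense
            else:
                sense = disambiguate(word, context)
                sense_cache[word] = sense        # cache for subsequent occurrences
            resolved.append((word, sense))
        return resolved

    if __name__ == "__main__":
        print(disambiguate_discourse(["bank", "case", "bank"]))
        # The second "bank" reuses the sense chosen for the first occurrence.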

    Flexibly Focusing on Supporting Facts, Using Bridge Links, and Jointly Training Specialized Modules for Multi-hop Question Answering

    With the help of the detailed annotations in the HotpotQA question answering dataset, recent question answering models are trained to justify their predicted answers with supporting facts from the context documents. Some related works train a single model to find supporting facts and answers jointly, without specialized models for each task. Others train separate models for each task but do not use supporting facts effectively to find the answer: they either use only the predicted sentences and ignore the remaining context, or do not use them at all. Furthermore, while complex graph-based models consider the bridge/connection between documents in the multi-hop setting, simple BERT-based models usually drop it. We propose FlexibleFocusedReader (FFReader), a model that 1) Flexibly focuses on predicted supporting facts (SFs) without ignoring the important remaining context, 2) Focuses on the bridge between documents, despite not using graph architectures, and 3) Jointly learns to predict SFs and answer with two specialized models. Our model achieves consistent improvement over the baseline. In particular, we find that flexibly focusing on SFs is important, rather than ignoring the remaining context or not using SFs at all when finding the answer. We also find that tagging the entity that links the documents at hand is very beneficial. Finally, we show that joint training is crucial for FFReader.
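    The sketch below shows one way the bridge-entity tagging mentioned above could look in a BERT-style concatenated input; the [B]/[/B] marker tokens and helper names are assumptions for illustration, not necessarily FFReader's actual scheme.

    def mark_bridge_entity(document: str, bridge_entity: str,
                           start_tag: str = "[B]", end_tag: str = "[/B]") -> str:
        # Wrap every occurrence of the bridge entity with marker tokens so the
        # encoder can attend to the link between the two documents.
        return document.replace(bridge_entity, f"{start_tag} {bridge_entity} {end_tag}")

    def build_input(question: str, doc1: str, doc2: str, bridge_entity: str) -> str:
        # Concatenate the question and both documents, with the bridge entity
        # tagged, in a [CLS]/[SEP] layout typical of BERT-based readers.
        doc1_tagged = mark_bridge_entity(doc1, bridge_entity)
        doc2_tagged = mark_bridge_entity(doc2, bridge_entity)
        return f"[CLS] {question} [SEP] {doc1_tagged} [SEP] {doc2_tagged} [SEP]"

    if __name__ == "__main__":
        q = "Where was the author of Norwegian Wood born?"
        d1 = "Norwegian Wood is a novel by Haruki Murakami."
        d2 = "Haruki Murakami was born in Kyoto."
        print(build_input(q, d1, d2, bridge_entity="Haruki Murakami"))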

    Fertilization of case frame dictionary for robust Japanese case analysis

    This paper proposes a method of fertilizing a Japanese case frame dictionary to handle complicated expressions: double nominative sentences, non-gapping relation of relative clauses, and case change. Our method is divided into two stages. In the first stage, we parse a large corpus and construct a Japanese case frame dictionary automatically from the parse results. In the second stage, we apply case analysis to the large corpus utilizing the constructed case frame dictionary, and upgrade the case frame dictionary by incorporating the newly acquired information.
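    A minimal sketch of the two-stage procedure, with parse_sentence() and analyze_cases() as placeholder stand-ins for a real Japanese parser and case analyzer; the function names and demo data are illustrative assumptions, not the paper's implementation.

    from collections import defaultdict

    def parse_sentence(sentence):
        # Placeholder for a real Japanese parser: yields (verb, case, argument)
        # triples; here we fake a single triple for the demo sentence.
        return [("食べる", "ヲ", "パン")] if "パン" in sentence else []

    def analyze_cases(sentence, case_frames):
        # Placeholder for case analysis with the current dictionary: in the real
        # method this recovers arguments from double nominatives, non-gapping
        # relative clauses, and case change; here it returns nothing.
        return []

    def build_case_frames(corpus):
        # Stage 1: construct case frames automatically from raw parses of a corpus.
        case_frames = defaultdict(lambda: defaultdict(set))
        for sentence in corpus:
            for verb, case, arg in parse_sentence(sentence):
                case_frames[verb][case].add(arg)
        return case_frames

    def upgrade_case_frames(corpus, case_frames):
        # Stage 2: re-apply case analysis using the dictionary and fold the
        # newly acquired information back into it.
        for sentence in corpus:
            for verb, case, arg in analyze_cases(sentence, case_frames):
                case_frames[verb][case].add(arg)
        return case_frames

    if __name__ == "__main__":
        corpus = ["パンを食べる"]
        frames = upgrade_case_frames(corpus, build_case_frames(corpus))
        print({v: {c: sorted(a) for c, a in cf.items()} for v, cf in frames.items()})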

    Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese–Japanese Wikipedia

    Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. As comparable corpora are far more widely available, many studies have been conducted to extract either parallel sentences or fragments from them for SMT. In this article, we propose an integrated system that extracts both parallel sentences and fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences among the comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and a classifier for parallel sentence identification; we improve it by proposing a novel filtering strategy and three novel feature sets for classification. Previous studies have found it difficult to accurately extract parallel fragments from comparable sentences. We propose an accurate parallel fragment extraction method that uses an alignment model to locate parallel fragment candidates and an accurate lexicon-based filter to identify the truly parallel fragments. A case study on Chinese–Japanese Wikipedia indicates that our proposed methods outperform previously proposed methods, and that the parallel data extracted by our system significantly improves SMT performance.
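    The overall pipeline can be pictured roughly as below: a lexicon-based candidate filter plus a classifier select parallel sentences, and fragments are then extracted from the remaining comparable sentences via word-level alignment and a lexicon filter. All scoring components here are simplistic stand-ins for the article's actual filter, classifier, and alignment model; names, thresholds, and the toy lexicon are assumptions.

    def candidate_filter(src, tgt, lexicon, min_overlap=0.3):
        # Keep sentence pairs whose lexicon-translated word overlap is high enough.
        translated = {lexicon.get(w) for w in src.split()} - {None}
        tgt_words = set(tgt.split())
        return len(translated & tgt_words) / max(len(tgt_words), 1) >= min_overlap

    def classify_parallel(src, tgt):
        # Placeholder for the parallel-sentence classifier; always accepts here.
        return True

    def extract_fragments(src, tgt, lexicon):
        # Crude word-level "alignment": keep source words whose lexicon
        # translation appears in the target sentence.
        tgt_words = tgt.split()
        return [(w, lexicon[w]) for w in src.split() if lexicon.get(w) in tgt_words]

    def extract(comparable_pairs, lexicon):
        sentences, fragments = [], []
        for src, tgt in comparable_pairs:
            if candidate_filter(src, tgt, lexicon) and classify_parallel(src, tgt):
                sentences.append((src, tgt))      # accepted as a parallel sentence
            else:
                fragments.extend(extract_fragments(src, tgt, lexicon))
        return sentences, fragments

    if __name__ == "__main__":
        lexicon = {"猫": "cat", "犬": "dog"}
        pairs = [("猫 が 好き", "the cat"), ("犬 と 経済", "the dog ran away fast")]
        print(extract(pairs, lexicon))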

    Building a Diverse Document Leads Corpus Annotated with Semantic Relations
